Joint environment and speaker normalization using factored front-end CMLLR

Authors

Shakti Rath

Sunil Sivadas

Bin Ma

Abstract

This paper addresses the problem of jointly compensating for environment and speaker variability. It investigates a factored feature-space transform, factored front-end CMLLR (F-FE-CMLLR), which is a cascade of two transforms: front-end CMLLR for environment normalization and CMLLR for speaker normalization. We propose an iterative estimation algorithm for F-FE-CMLLR. We believe iterative estimation helps decouple the effects of the two acoustic factors, so that each transform learns the effect of only one factor. This should improve speech recognition performance compared to sequential estimation. However, estimating the environment transform yields full-covariance Gaussians in the GMM-HMM, which makes direct estimation computationally expensive. We present an efficient training algorithm that reduces the computational cost considerably. We also show that a row-by-row optimization procedure can be used, which makes the algorithm more efficient and attractive. On the multi-condition Aurora 4 task with a discriminatively trained GMM-HMM, F-FE-CMLLR yields relative improvements of 11.6% and 8.7% on two evaluation sets. Both are measured against baseline features processed only by CMLLR for speaker normalization.
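As an illustration, assume both stages are single global affine transforms. The abstract does not give the exact form of the front-end CMLLR stage, so this is an assumption. Under it, the cascade maps a frame x_t to $$\hat{\mathbf{x}}_t = \mathbf{A}_s(\mathbf{A}_e\mathbf{x}_t + \mathbf{b}_e) + \mathbf{b}_s.$$ Now hold the speaker transform fixed and write y_t = A_e x_t + b_e. A standard Gaussian identity gives $$\mathcal{N}(\mathbf{A}_s\mathbf{y}_t+\mathbf{b}_s;\boldsymbol{\mu}_m,\boldsymbol{\Sigma}_m) = \frac{1}{|\det\mathbf{A}_s|}\,\mathcal{N}\big(\mathbf{y}_t;\mathbf{A}_s^{-1}(\boldsymbol{\mu}_m-\mathbf{b}_s),\,\mathbf{A}_s^{-1}\boldsymbol{\Sigma}_m\mathbf{A}_s^{-\top}\big).$$ This is one way to see why environment-transform estimation faces full-covariance Gaussians: even when Σ_m is diagonal, A_s^{-1} Σ_m A_s^{-T} generally is not.

The Python sketch below is not the authors' algorithm. It shows two things:
(1) Applying the cascade together with its log-Jacobian term.
(2) The standard diagonal-covariance CMLLR row-by-row update of Gales (1998), applied here to the speaker stage only. The paper's efficient full-covariance procedure for the environment stage is not reproduced.
All function names (apply_cascade, cmllr_stats, cmllr_aux, cmllr_row_update) are hypothetical.

```python
# Illustrative sketch (assumption): single global affine transforms and
# diagonal-covariance Gaussians; NOT the paper's F-FE-CMLLR implementation.
import numpy as np


def apply_cascade(x, A_env, b_env, A_spk, b_spk):
    """Hypothetical helper: environment transform, then speaker transform.

    x: (T, d) frames. Returns the transformed frames and log|det(A_spk A_env)|,
    the Jacobian term each frame's log-likelihood must include.
    """
    y = x @ A_env.T + b_env          # environment normalization stage
    z = y @ A_spk.T + b_spk          # speaker normalization stage
    logdet = np.linalg.slogdet(A_env)[1] + np.linalg.slogdet(A_spk)[1]
    return z, logdet


def cmllr_stats(x, gamma, means, variances):
    """Accumulate diagonal-covariance CMLLR statistics (Gales, 1998).

    gamma: (T, M) Gaussian occupancies; means, variances: (M, d).
    Returns G: (d, d+1, d+1), k: (d, d+1) and beta (total occupancy).
    """
    T, d = x.shape
    xi = np.hstack([np.ones((T, 1)), x])            # extended vectors [1, x]
    G = np.zeros((d, d + 1, d + 1))
    k = np.zeros((d, d + 1))
    for m in range(means.shape[0]):
        g = gamma[:, m]
        S = (xi * g[:, None]).T @ xi                # sum_t gamma_m(t) xi xi^T
        s = g @ xi                                  # sum_t gamma_m(t) xi
        for i in range(d):
            G[i] += S / variances[m, i]
            k[i] += means[m, i] / variances[m, i] * s
    return G, k, gamma.sum()


def cmllr_aux(W, G, k, beta):
    """Auxiliary function: beta*log|det A| - 1/2 sum_i (w_i G_i w_i^T - 2 w_i k_i^T)."""
    A = W[:, 1:]
    quad = sum(W[i] @ G[i] @ W[i] - 2 * W[i] @ k[i] for i in range(W.shape[0]))
    return beta * np.linalg.slogdet(A)[1] - 0.5 * quad


def cmllr_row_update(W, G, k, beta, n_iter=5):
    """Row-by-row update of W = [b A]: each row is maximized with the others fixed."""
    W = W.copy()
    d = W.shape[0]
    for _ in range(n_iter):
        for i in range(d):
            A = W[:, 1:]
            cof = np.linalg.det(A) * np.linalg.inv(A).T     # cofactor matrix of A
            p = np.concatenate([[0.0], cof[i]])             # extended cofactor row
            Ginv = np.linalg.inv(G[i])
            a = p @ Ginv @ p
            b = p @ Ginv @ k[i]
            # alpha solves a*alpha^2 + b*alpha - beta = 0; one root per sign of det(A).
            disc = np.sqrt(b * b + 4 * a * beta)
            roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
            # Row objective up to a constant: beta*log|alpha*a + b| - a*alpha^2/2.
            obj = [beta * np.log(abs(r * a + b)) - 0.5 * a * r * r for r in roots]
            alpha = roots[int(np.argmax(obj))]
            W[i] = (alpha * p + k[i]) @ Ginv
    return W
```

A typical use starts from the identity transform W = [0 I] and calls cmllr_row_update with statistics from one speaker's adaptation data. Each row update is the exact maximizer of the auxiliary function with the other rows held fixed, so the objective cannot decrease from one row update to the next.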

Similar resources

Separating Speaker and Environmental Variability Using Factored Transforms

The speaker and the environment are two primary sources of variability that degrade accuracy in speech recognition systems. Many algorithms for speaker or environment adaptation have been proposed to improve performance. Far less attention has been paid to approaches that address both factors. In this paper, we present a method for compensating for speaker and environmental mismatch ...

Model-Based Approaches for Degraded Channel Modelling in Robust ASR

Speech is usually observed after passing through some form of “channel” that results in distortions. For some scenarios it is possible to build explicit models of this channel distortion and hence compensate the acoustic models. However the accuracy of the distortion model is sometimes poor and more general adaptation approaches are required. This paper investigates these model-based approaches...

Adaptive Training Using Simple Target Models

Adaptive training aims at reducing the influence of speaker, channel and environment variability on the acoustic models. We describe an acoustic normalization approach to adaptive training. Phonetically irrelevant acoustic variability is reduced at the beginning of the training procedure w.r.t. a set of target models. The set of target models can be a set of HMMs or a Gaussian mixture model (...

MLLR techniques for speaker recognition

Maximum-Likelihood Linear Regression (MLLR) and Constrained MLLR (CMLLR) have been recently used for feature extraction in speaker recognition. These systems use (C)MLLR transforms as features that are modeled with Support Vector Machines (SVM). This paper evaluates and compares several of these approaches for the NIST Speaker Recognition task. Single CMLLR and up to 4-phonetic-class MLLR trans...

Feature Level Compensation for Robust Speaker Identification in Mismatched Conditions

In this paper, robust front-end features are proposed to improve speaker identification (SI) performance in real-world situations, such as mismatch between training and testing conditions. The most commonly used features, MFCCs, are highly sensitive to effects such as channel and environment mismatch. Characteristics of speech change with room acoustics, ch...

Publication date: 2015
